UbiCrawler: a scalable fully distributed Web crawler
Abstract
We present the design and implementation of UbiCrawler, a scalable distributed web crawler, and we analyze its performance. The main features of UbiCrawler are platform independence, fault tolerance, a very effective assignment function for partitioning the domain to crawl, and, more generally, the complete decentralization of every task.
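The abstract names an assignment function for partitioning the domain to crawl but does not describe it here. As a rough illustration only, one common way to split hosts among crawling agents without a central coordinator is consistent hashing. The sketch below assumes that technique; the class name `HostAssigner`, the `replicas` parameter, and the agent names are hypothetical and are not taken from the paper.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on a 64-bit ring (stable across runs)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HostAssigner:
    """Illustrative consistent-hashing assigner (an assumption, not the paper's code)."""

    def __init__(self, agents, replicas=100):
        if not agents:
            raise ValueError("at least one agent is required")
        # Each agent is placed on the ring at `replicas` pseudo-random points.
        self._points = sorted(
            (_hash(f"{agent}#{i}"), agent)
            for agent in agents
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def agent_for(self, host: str) -> str:
        """Return the agent owning the first ring point at or after the host's hash."""
        idx = bisect.bisect_left(self._keys, _hash(host)) % len(self._keys)
        return self._points[idx][1]


if __name__ == "__main__":
    assigner = HostAssigner(["agent-a", "agent-b", "agent-c"])
    for host in ["example.org", "example.com", "example.net"]:
        print(host, "->", assigner.agent_for(host))
```

With this scheme every agent can compute the owner of a host locally, and when an agent leaves, only the hosts it owned are reassigned; hosts owned by the remaining agents stay where they were.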
Similar resources
Performance Modeling of a Distributed Web Crawler using Stochastic Activity Networks
One of the basic requirements of Web mining is a crawler system, which collects the information from the Web. To predict the performance, dependability and other operational measures of a system, it is required to construct and evaluate a formal model of the system. We have constructed a formal model for a distributed crawler, which is based on UbiCrawler, using stochastic activity networks (SA...
Design and Implementation of an Efficient Distributed Web Crawler with Scalable Architecture
Distributed Web crawlers have recently received more and more attention from researchers. Centralized solutions are known to have problems such as link congestion and being a single point of failure, while fully distributed crawlers have become an interesting architectural paradigm for their scalability and the increased autonomy of nodes. This paper provides a distributed crawler system which consists of mult...
Trovatore: Towards a Highly Scalable Distributed Web Crawler
Trovatore is an ongoing project aimed at realizing an efficient distributed and highly scalable web crawler. This poster illustrates the main ideas behind its design.
Design and Implementation of Scalable, Fully Distributed Web Crawler for a Web Search Engine
The Web is a context in which traditional Information Retrieval methods are challenged. Given the volume of the Web and its speed of change, the coverage of modern web search engines is relatively small. Search engines attempt to crawl the web exhaustively with crawlers for new pages, and to keep track of changes made to pages visited earlier. The centralized design of crawlers introduces limita...
A Scalable, Distributed Web-Crawler
In this paper we present a design and implementation of a scalable, distributed web-crawler. The motivation for the design of such a system is to effectively distribute crawling tasks to different machines in a peer-to-peer distributed network. Such an architecture will lead to scalability and help tame the exponential growth of the crawl space in the World Wide Web. With experiments on the implementation of th...
Journal title:
- Softw., Pract. Exper.
Volume 34
Publication date: 2004